There are three main families of visualization functions:
- plot() – See ?plot
- lattice – See ?lattice::Lattice
- ggplot2 – See ?ggplot2::ggplot2
Basic plot syntax:
plot(x, y), where x is the vector for the x axis and y is the vector for the y axis
See ?plot
x <- 1:10
y <- 1:10
plot(x, y)
Scatter plot of the iris data
plot(iris$Sepal.Width, iris$Sepal.Length)
Histogram of the iris data
hist(iris$Sepal.Width)
Use par() to draw multiple plots side by side (here 1 row, 2 columns)
par(mfrow=c(1,2))
plot(iris$Sepal.Width, iris$Sepal.Length)
hist(iris$Sepal.Width)
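par() settings stay in effect for every plot that follows, so a common habit is to save the old settings and restore them when you are done. A minimal sketch of that pattern, which also labels the axes with the standard xlab, ylab and main arguments (the label text is my own choice):

```r
# Switch to a 1 x 2 layout; par() returns the previous values when you change them
old_par <- par(mfrow = c(1, 2))

plot(iris$Sepal.Width, iris$Sepal.Length,
     xlab = "Sepal width (cm)", ylab = "Sepal length (cm)",
     main = "Sepal length vs. width")
hist(iris$Sepal.Width,
     xlab = "Sepal width (cm)", main = "Sepal width")

# Restore the previous layout so later plots fill the whole device again
par(old_par)
```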
plot() vs ggplot()
- A picture is worth a thousand words – when the picture is good
[Figure: example plots made with ggplot() and ggplotly()]
- A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”.
- You provide the data, tell ggplot2 how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
- Nearly every element of a plot can be customized.
- It is part of the tidyverse, a collection of R packages that share common philosophies and are designed to work together.
# The easiest way to get ggplot2 is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just ggplot2:
install.packages("ggplot2")
# Don't forget to load the tidyverse into your R session
library(tidyverse)
# Or just ggplot2
library(ggplot2)
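A ggplot2 graph is assembled from a few building blocks:
- Data: supplied to ggplot()
- Aesthetic mappings: aes()
- Geometries: geom_ functions
- Scales and labels: scale_ functions, or labs() and lims()
- Facets: facet_ functions
- Coordinate systems: coord_ functions
The iris data
head(iris, 3)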
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
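The building blocks listed earlier can all go into a single call. A minimal sketch on iris; the colors, axis limits and aspect ratio below are arbitrary choices for illustration, not recommendations:

```r
library(ggplot2)

p_all <- ggplot(data = iris,                            # data
                aes(x = Sepal.Width, y = Sepal.Length,
                    color = Species)) +                 # aesthetic mappings
  geom_point() +                                        # geometry
  scale_color_manual(values = c(setosa = "#1b9e77",
                                versicolor = "#d95f02",
                                virginica = "#7570b3")) +  # scale
  labs(x = "Sepal width (cm)", y = "Sepal length (cm)") +  # labels
  lims(x = c(2, 4.5)) +                                 # axis limits
  facet_wrap(~ Species) +                               # facets
  coord_fixed(ratio = 1)                                # coordinate system
p_all
```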
aes()
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p
geom_
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point()
color
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species))
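An aesthetic mapped inside aes() depends on a variable, while an aesthetic set outside aes() is one fixed value for the whole layer. A small sketch of both, plus the common mistake of putting a constant inside aes(); the color "steelblue" is an arbitrary choice:

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

# Mapping: color depends on a variable, so it goes inside aes()
p_mapped <- p + geom_point(aes(color = Species))

# Setting: one fixed color for every point, so it goes outside aes()
p_set <- p + geom_point(color = "steelblue")

# Pitfall: a constant inside aes() is treated as data, so ggplot2 builds a
# one-level legend labelled "steelblue" and draws the points in its default hue
p_pitfall <- p + geom_point(aes(color = "steelblue"))
```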
color + size
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length))
color + size + alpha (transparency)
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width))
color + size + alpha + shape
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width, shape=Species))
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width, shape=Species)) +
guides( color=guide_legend(ncol = 3, byrow = TRUE),
size=guide_legend(ncol = 3, byrow = TRUE),
alpha=guide_legend(ncol = 3, byrow = TRUE))
geom: point + smooth
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width)) +
geom_smooth()
What will this give me?
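Oops! What happened? The color mapping is defined only inside geom_point(), so geom_smooth() never sees Species and fits a single curve through all the points.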
geom: point + smooth
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length, color=Species))
p + geom_point(aes(size=Petal.Length, alpha=Petal.Width)) +
geom_smooth()
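Why did this work now?
Can you see the difference? This time color=Species is set in the global aes() inside ggplot(), so every layer inherits it, including geom_smooth(), which now fits one curve per species.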
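One way to check the difference is to count how many separate fits the smoother layer produces, using ggplot2's layer_data(). A sketch, assuming method = "lm" instead of the default smoother only to keep the fit simple and free of messages:

```r
library(ggplot2)

# Color mapped only inside geom_point(): geom_smooth() does not see Species
p_local <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Color mapped in ggplot(): every layer inherits it, including geom_smooth()
p_global <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Number of separate fitted lines in the smooth layer (layer 2)
n_fits_local  <- length(unique(layer_data(p_local, 2)$group))
n_fits_global <- length(unique(layer_data(p_global, 2)$group))
```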
geom: point + smooth
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(size=Petal.Length, alpha=Petal.Width)) + geom_smooth(aes(color=Species))
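What about this? What’s happening here?
Now the color is only defined in geom_smooth() and not in geom_point(): the smooth curves are still split and colored by species, while the points are all drawn in the default black.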
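If you want one smooth per species without coloring the lines, the group aesthetic splits the data without mapping it to a color. A minimal sketch; method = "lm" and se = FALSE are my choices here, not part of the original example:

```r
library(ggplot2)

# group = Species splits the smoother into one fit per species without
# coloring it, so all three lines share geom_smooth()'s default color
p_grouped <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(color = Species)) +
  geom_smooth(aes(group = Species), method = "lm", formula = y ~ x, se = FALSE)
p_grouped
```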
Let’s generate a hypothetical version of iris with added ecosystem type and precipitation data.
Here I am using the sample() function, which randomly picks values from a vector.
size = 150 sets how many values to pick (one for each row of iris), and replace = TRUE allows the same value to be picked multiple times.
Then I bind these to iris as two new columns, Ecosystem and Precipitation, in a new dataset called iris2.
ecosys <- sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE)
precp <- sample(c("Heavy", "Mild"), size = 150, replace = TRUE)
iris2 <- cbind(iris, Ecosystem=ecosys, Precipitation=precp)
head(iris2)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Ecosystem
## 1 5.1 3.5 1.4 0.2 setosa Riparian
## 2 4.9 3.0 1.4 0.2 setosa Urban
## 3 4.7 3.2 1.3 0.2 setosa Riparian
## 4 4.6 3.1 1.5 0.2 setosa Urban
## 5 5.0 3.6 1.4 0.2 setosa Riparian
## 6 5.4 3.9 1.7 0.4 setosa Urban
## Precipitation
## 1 Heavy
## 2 Heavy
## 3 Heavy
## 4 Mild
## 5 Heavy
## 6 Mild
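Because sample() draws at random, iris2 will be different every time the code runs, and the Ecosystem and Precipitation values will not match the output above. Calling set.seed() first makes the draws reproducible; a sketch, where the seed 42 is arbitrary:

```r
# Fix the random number generator so the same draws come back every run
set.seed(42)  # any integer works; 42 is arbitrary

ecosys <- sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE)
precp  <- sample(c("Heavy", "Mild"), size = 150, replace = TRUE)
iris2  <- cbind(iris, Ecosystem = ecosys, Precipitation = precp)

# How many rows fall in each Ecosystem x Precipitation combination
table(iris2$Ecosystem, iris2$Precipitation)
```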
iris2
Now I would like to see how my previous graph changes across the different ecosystem types and precipitation levels.
I start from the earlier graph with two changes: I am leaving out geom_smooth() for now because there are not enough data points per panel for model prediction, and I dropped the alpha aesthetic to make the points easier to see.
p2 <- ggplot(data=iris2, aes(x=Sepal.Width, y=Sepal.Length, color=Species))
p2 <- p2 + geom_point(aes(size=Petal.Length)) # + geom_smooth()
p2
Now I add facets!
p2 + facet_grid(Ecosystem ~ Precipitation)
I can customize the facets very easily! The formula reads rows ~ columns, and a . means no faceting in that direction:
p2 + facet_grid( . ~ Precipitation)
p2 + facet_grid(Ecosystem ~ .)
p2 + facet_grid(Precipitation ~ .)
You get the idea here, right?
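The same grid can also be written with vars(), and a labeller can show the variable name next to each value in the strips. A sketch that rebuilds iris2 with an arbitrary seed so it runs on its own:

```r
library(ggplot2)

set.seed(42)  # arbitrary seed so the random columns are reproducible
iris2 <- cbind(iris,
               Ecosystem = sample(c("Forest", "Riparian", "Urban"), 150, replace = TRUE),
               Precipitation = sample(c("Heavy", "Mild"), 150, replace = TRUE))

p2 <- ggplot(iris2, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point(aes(size = Petal.Length))

# Same layout as facet_grid(Ecosystem ~ Precipitation), written with vars();
# label_both prints "variable: value" in each strip
p_grid <- p2 + facet_grid(rows = vars(Ecosystem), cols = vars(Precipitation),
                          labeller = label_both)
p_grid
```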
wages
You can use facet_wrap() when you facet by a single variable but want the panels arranged neatly into rows and columns.
First, let’s read the wages data in R.
wages <- read.csv("./wages.csv",
header = T,
stringsAsFactors = T)
head(wages, 3)
## earn height sex race ed age
## 1 79571.30 73.89 male white 16 49
## 2 96396.99 66.23 female white 16 62
## 3 48710.67 63.77 female white 16 33
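stringsAsFactors = TRUE turns text columns such as sex and race into factors; since R 4.0.0 the default is FALSE, which keeps them as character vectors. Since wages.csv may not be at hand, this sketch writes the three rows shown above to a temporary file and reads it both ways:

```r
# Write the three rows shown above to a temporary file so the example
# runs without wages.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("earn,height,sex,race,ed,age",
             "79571.30,73.89,male,white,16,49",
             "96396.99,66.23,female,white,16,62",
             "48710.67,63.77,female,white,16,33"), tmp)

# stringsAsFactors = TRUE: text columns become factors
w_fac <- read.csv(tmp, header = TRUE, stringsAsFactors = TRUE)
# Default (FALSE since R 4.0.0): text columns stay character
w_chr <- read.csv(tmp, header = TRUE)

str(w_fac)
```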
Let’s create age categories with the cut() function, which turns the continuous age variable into a categorical one. The breaks = seq(20, 100, by=10) argument sets 10-year intervals: (20,30], (30,40], and so on up to (90,100]. Then I assign this new variable to a new column called age_cat.
wages <- wages %>% mutate(age_cat = cut(age, breaks = seq(20, 100, by=10)) )
head(wages, 4)
## earn height sex race ed age age_cat
## 1 79571.30 73.89 male white 16 49 (40,50]
## 2 96396.99 66.23 female white 16 62 (60,70]
## 3 48710.67 63.77 female white 16 33 (30,40]
## 4 80478.10 63.22 female other 16 95 (90,100]
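By default cut() builds intervals that are open on the left and closed on the right, so a value equal to the lowest break (here, an age of exactly 20) becomes NA. A sketch using the four ages above plus a hypothetical age of 20, showing two ways to keep it:

```r
ages <- c(49, 62, 33, 95, 20)  # four ages from the output above, plus a hypothetical 20

# Default: (20,30], (30,40], ... so 20 falls outside every interval -> NA
a_default <- cut(ages, breaks = seq(20, 100, by = 10))

# include.lowest = TRUE closes the first interval: [20,30]
a_lowest <- cut(ages, breaks = seq(20, 100, by = 10), include.lowest = TRUE)

# right = FALSE flips the intervals to [20,30), [30,40), ..., [90,100)
a_left <- cut(ages, breaks = seq(20, 100, by = 10), right = FALSE)

print(a_default)
print(a_lowest)
print(a_left)
```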
Let’s plot it
pw <- ggplot(wages, aes(x=height, y=earn)) +
geom_point(aes(size=ed), alpha=0.5)
pw
pw + facet_wrap(~age_cat)
You can also set the number of columns (ncol) or rows (nrow) in the panel layout:
pw + facet_wrap(~age_cat, ncol=5)
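facet_wrap() also accepts nrow, and scales = "free_y" lets each panel pick its own y range. A sketch on just the four wage rows shown earlier, so it runs without wages.csv; nrow = 2 and the free y scale are illustrative choices:

```r
library(ggplot2)

# The four rows of wages shown above (earn, height, ed, age only)
wages_small <- data.frame(
  earn   = c(79571.30, 96396.99, 48710.67, 80478.10),
  height = c(73.89, 66.23, 63.77, 63.22),
  ed     = c(16, 16, 16, 16),
  age    = c(49, 62, 33, 95)
)
wages_small$age_cat <- cut(wages_small$age, breaks = seq(20, 100, by = 10))

pw_small <- ggplot(wages_small, aes(x = height, y = earn)) +
  geom_point(aes(size = ed), alpha = 0.5) +
  # nrow fixes the number of panel rows; unused age_cat levels are dropped
  # by default, and scales = "free_y" gives each panel its own y range
  facet_wrap(~ age_cat, nrow = 2, scales = "free_y")
pw_small
```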
Plot the wages.csv data like the following